Statistical Match Procedure Used in the 2010 and Later Baselines

The statistical match procedure used beginning with the 2010 baseline is an unconstrained nearest neighbor match similar to that used in 2007-2009, but with changes to the treatment of high-income units. Prior to matching, the CPS and PUF records are divided into mutually exclusive groups, and a CPS record can be matched only to a PUF record in the same group. The groups are defined by the following "blocking variables":

  • filing status - whether the taxpayer files a joint or non-joint return
  • Social Security receipt - whether the tax unit receives Social Security income
  • dependent children - the number of dependent children of the taxpayer living within the household (none, one, or two or more)
  • dependency status - whether the taxpayer can be claimed as a dependent on another return
Certain block groups are collapsed. Tax units with Social Security income are subdivided into blocks defined by filing status (joint/non-joint) but are not differentiated by dependency status or number of dependent children. Dependent returns without Social Security income are treated as a single block: they are not differentiated by filing status, and because they cannot themselves claim dependents, there is no additional blocking by number of dependent children.
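The blocking and collapsing rules above can be sketched as a function that maps a tax unit to a block key. This is a minimal illustration, not TRIM3 code: the `TaxUnit` type, its field names, and the key labels (`"ss"`, `"dependent"`, `"other"`) are hypothetical.

```python
from typing import NamedTuple


class TaxUnit(NamedTuple):
    # Hypothetical field names chosen for this sketch.
    joint: bool                  # files a joint return
    has_social_security: bool    # receives Social Security income
    n_dependent_children: int    # dependent children in the household
    is_dependent: bool           # can be claimed on another return


def block_key(unit: TaxUnit) -> tuple:
    """Return the matching block for a tax unit, with collapsed groups."""
    filing = "joint" if unit.joint else "non-joint"
    if unit.has_social_security:
        # Social Security units are split by filing status only.
        return ("ss", filing)
    if unit.is_dependent:
        # Dependent returns without Social Security form a single block.
        return ("dependent",)
    # All other units: filing status and children (none, one, two or more).
    children = min(unit.n_dependent_children, 2)
    return ("other", filing, children)
```

A CPS unit and a PUF record would then be candidates for each other only when their block keys are equal.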

Several additional constraints are imposed on the matching algorithm that have the effect of reducing the number of PUF records that are potential matches to a particular TRIM3 record. These constraints relate to:

  • Capital Gains and Transfer Program Recipients. The statistical match does not assign capital gains to tax units receiving SSI, TANF, public or subsidized housing, or food stamp benefits.
  • Home Ownership. A TRIM3 tax unit must own a house in order to be matched with a PUF tax unit that claims itemized deductions for home mortgage interest expenses or real estate taxes.
  • Presence of Mortgage. Since the 2010 calendar year, the CPS has information on whether a tax unit owns a home with or without a mortgage. A TRIM3 tax unit must report having a mortgage in order to be matched with a PUF tax unit that claims itemized deductions for home mortgage interest expenses.
  • State and Local Tax Deductions. A TRIM3 tax unit in a state without a state income tax can only be matched to a PUF record claiming the state and local income tax deduction if the PUF tax unit is also in a state without a state income tax.
  • Adjustments for Keogh/SEP Contributions. A TRIM3 tax unit must have business or farm self-employment income in order to be matched with a PUF tax unit that claims adjustments to income for contributions to Keogh or SEP retirement accounts.
  • Child and Dependent Care Expenses. A TRIM3 tax unit must have qualifying child care expenses to be matched to a PUF tax unit that claims child and dependent care expenses.
  • PUF Variables Exceeding Prescribed Levels. A single large value for a PUF variable can produce skewed results if the PUF record in question represents only a few tax units but is matched to a CPS tax record representing many tax units. To avoid this problem, the match procedure disallows matches to PUF records with very large values for certain variables.
  • Earner/Non-Earner Status. The PUF record and TRIM3 tax unit must have the same earner/non-earner status. A tax unit is classified as an "earner" if total wages, business, and farm income is non-zero.
  • Asset Related Income. If the TRIM3 tax unit has asset related income (interest, rent and royalties, or dividends) then it is only matched to a PUF return with asset related income.
  • Pension Income. If the CPS tax unit has pension income, then it is only matched to a PUF return with pension income. If the CPS tax unit does not have pension income and the PUF return has pension income, then it is only matched to the PUF record if the head of the CPS tax unit is age 55 or older.
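Several of the constraints above can be read as a yes/no eligibility test applied to each candidate pair within a block. The sketch below covers only the home ownership, mortgage, earner status, asset income, and pension rules. The `Cps` and `Puf` classes and their field names are hypothetical and are not TRIM3 or PUF variable names.

```python
from dataclasses import dataclass


@dataclass
class Cps:
    # Hypothetical fields for a TRIM3/CPS tax unit.
    owns_home: bool
    has_mortgage: bool
    earnings: float        # wages + business + farm income
    asset_income: float    # interest + rent/royalties + dividends
    pension_income: float
    head_age: int


@dataclass
class Puf:
    # Hypothetical fields for a PUF record.
    mortgage_interest_ded: float
    real_estate_tax_ded: float
    earnings: float
    asset_income: float
    pension_income: float


def is_eligible(cps: Cps, puf: Puf) -> bool:
    """Return True if the PUF record passes this subset of constraints."""
    claims_home_ded = puf.mortgage_interest_ded > 0 or puf.real_estate_tax_ded > 0
    if claims_home_ded and not cps.owns_home:
        return False  # home ownership rule
    if puf.mortgage_interest_ded > 0 and not cps.has_mortgage:
        return False  # presence-of-mortgage rule
    if (cps.earnings != 0) != (puf.earnings != 0):
        return False  # earner/non-earner status must agree
    if cps.asset_income != 0 and puf.asset_income == 0:
        return False  # asset income rule
    if cps.pension_income != 0 and puf.pension_income == 0:
        return False  # CPS pension recipients need a PUF pension
    if cps.pension_income == 0 and puf.pension_income != 0 and cps.head_age < 55:
        return False  # PUF pension allowed only if the head is 55 or older
    return True
```

The remaining constraints (transfer program recipients, state and local taxes, Keogh/SEP, child care, and very large PUF values) would be added as further checks of the same form.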

The 2010 baseline changed the 2005-2009 practice of using the PUF to restore variation to top-coded CPS incomes. In previous years, the Census Bureau top-coded income amounts exceeding certain thresholds in order to preserve confidentiality, and replaced top-coded amounts with averages calculated for all top-coded individuals. However, in the 2011 CPS, the Census Bureau adopted a new procedure of "rank proximity swapping," in which individuals with income amounts above a threshold value have their amounts "swapped" with the value of another high-income person within a bounded interval. This ensures that no high-income person's record contains that person's exact income data, while preserving the distribution of values above the threshold. Because this change to the data removes the need to restore variation, we altered our statistical match procedure to no longer import income values from the PUF; CPS income variables are now used throughout the baseline. Although it is no longer necessary to replace top-coded CPS income variables with income variables from the PUF, we continue to create five clones of high-income households in order to allow greater variation in capital gains and deductions obtained through the statistical match with the PUF.

Once the match procedure has identified the set of PUF records that can be matched to a given CPS tax unit, a PUF record is selected using a "minimum distance" function. This procedure differs between units that are "high income" (that is, with one or more income amounts above the threshold for rank proximity swapping) and lower income units. If the tax unit is not treated as a "high income" unit for the purpose of the match, then the distance function is computed based on AGI. Capital gains and IRA and Keogh contributions are obtained from the PUF record being considered for the match. The capital gains are added to, and the IRA and Keogh contributions are subtracted from, the preliminary AGI calculated by TRIM3. The resulting AGI is compared to the AGI of each available PUF record, and the record with the least difference in AGI is selected.
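For lower income units, the AGI-based selection can be sketched as follows. The dictionary keys (`agi`, `capital_gains`, `ira_keogh`) and function names are hypothetical, and the sketch assumes the distance is the absolute difference between the adjusted TRIM3 AGI and the PUF AGI.

```python
def agi_distance(prelim_agi: float, puf: dict) -> float:
    """Distance between a CPS unit and one candidate PUF record."""
    # Borrow capital gains and IRA/Keogh contributions from the candidate.
    adjusted_agi = prelim_agi + puf["capital_gains"] - puf["ira_keogh"]
    return abs(adjusted_agi - puf["agi"])


def select_by_agi(prelim_agi: float, candidates: list) -> dict:
    """Pick the eligible PUF record with the smallest AGI difference."""
    return min(candidates, key=lambda puf: agi_distance(prelim_agi, puf))


candidates = [
    {"id": "a", "agi": 50_000.0, "capital_gains": 0.0, "ira_keogh": 0.0},
    {"id": "b", "agi": 62_000.0, "capital_gains": 10_000.0, "ira_keogh": 2_000.0},
]
best = select_by_agi(54_000.0, candidates)
```

Note that the adjusted AGI is recomputed for each candidate, because the capital gains and contributions come from the candidate itself. In this example record "b" is selected: 54,000 + 10,000 - 2,000 equals its AGI of 62,000, while record "a" differs by 4,000.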

For "high income" tax units, the minimum distance function is computed by examining the difference between the CPS tax unit and the PUF record for each of ten income items reported on both the CPS and the PUF (wages, business income, farm income, interest, pensions, dividends/estates/trusts, rents/royalties, total social security benefits, unemployment compensation, and alimony received). The PUF record with the least absolute difference across these income items is selected as a match.
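A minimal sketch of the high-income distance follows. It assumes that "least absolute difference across these income items" means the sum of the absolute item-by-item differences, which the text does not state explicitly. The item keys are hypothetical labels for the ten income items listed above.

```python
# Hypothetical keys for the ten income items reported on both files.
INCOME_ITEMS = (
    "wages", "business", "farm", "interest", "pensions",
    "dividends_estates_trusts", "rents_royalties",
    "social_security", "unemployment", "alimony",
)


def high_income_distance(cps: dict, puf: dict) -> float:
    """Sum of absolute differences over the ten income items (assumed form)."""
    return sum(abs(cps.get(k, 0.0) - puf.get(k, 0.0)) for k in INCOME_ITEMS)


def select_high_income(cps: dict, candidates: list) -> dict:
    """Pick the eligible PUF record closest to the CPS unit."""
    return min(candidates, key=lambda puf: high_income_distance(cps, puf))


cps_unit = {"wages": 300_000.0, "interest": 20_000.0}
puf_1 = {"wages": 280_000.0, "interest": 25_000.0}
puf_2 = {"wages": 300_000.0, "dividends_estates_trusts": 40_000.0}
chosen = select_high_income(cps_unit, [puf_1, puf_2])
```

Here `puf_1` is chosen: its total difference is 25,000 (20,000 in wages plus 5,000 in interest), against 60,000 for `puf_2`.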

Once a PUF record has been selected, variables from that record are assigned to the CPS tax unit. The weight of the PUF record is then reduced by the weight of the CPS tax unit. Once the weight for a PUF record has been reduced to zero, it cannot be matched to additional CPS tax units.
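The weight bookkeeping described above can be sketched as below. The function names and the dictionary of remaining weights are hypothetical. The sketch also assumes that the remaining weight is floored at zero when the CPS weight exceeds it, a case the text does not address.

```python
def assign_match(remaining: dict, puf_id: str, cps_weight: float) -> None:
    """Reduce a PUF record's remaining weight after it has been matched."""
    if remaining[puf_id] <= 0:
        raise ValueError(f"PUF record {puf_id} has no weight left")
    # Assumption: never let the remaining weight go below zero.
    remaining[puf_id] = max(0.0, remaining[puf_id] - cps_weight)


def available(remaining: dict) -> list:
    """PUF records that can still be matched to further CPS tax units."""
    return [puf_id for puf_id, weight in remaining.items() if weight > 0]


remaining = {"p1": 100.0, "p2": 50.0}
assign_match(remaining, "p1", 60.0)   # p1 now has weight 40.0
assign_match(remaining, "p1", 60.0)   # p1 is exhausted
```

After the two assignments, only "p2" is still available for matching.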

Because the variables obtained through the statistical match for an individual tax unit come from a single PUF record, we are limited in our ability to align any specific variable to a target. However, we do make some adjustments. We adjust the capital gains and deduction dollar amounts to reflect the change in average dollar amounts between the year of the PUF data and the tax year being simulated, and we make minor adjustments to increase or decrease the likelihood of selecting a PUF record based on whether the record has income or deduction values from particular sources (such as capital gains). We also perform some minimal alignment by adjusting the dollar amounts used to disallow matches to PUF records with very large income or deduction amounts.
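One simple way to picture the dollar-amount adjustment is proportional scaling by the ratio of average amounts in the two years. This is an assumption made for illustration; the text does not give the formula used, and the function and parameter names are hypothetical.

```python
def age_amount(amount: float, avg_puf_year: float, avg_sim_year: float) -> float:
    """Scale a PUF dollar amount to the simulated tax year (assumed form)."""
    if avg_puf_year == 0:
        # No basis for scaling; leave the amount unchanged.
        return amount
    return amount * avg_sim_year / avg_puf_year


# A 1,000 deduction when the average rose from 20,000 to 30,000.
aged = age_amount(1_000.0, 20_000.0, 30_000.0)
```

Under this assumption, the 1,000 deduction is scaled by 1.5 to 1,500.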

The 2010 baseline used the 2007 PUF, and the 2011-2013 baselines used the 2008 PUF.